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Efficient routing algorithms for multiple vehicles 
with no explicit communications 


Alessandro Arsie Ketan Savla Emilio Frazzoli 


Abstract 

In this paper we consider a class of dynamic vehicle routing problems, in which a number of mobile agents in the plane must 
visit target points generated over time by a stochastic process. It is desired to design motion coordination strategies in order to 
minimize the expected time between the appearance of a target point and the time it is visited by one of the agents. We propose 
control strategies that, while making minimal or no assumptions on communications between agents, provide the same level of 
steady-state performance achieved by the best known decentralized strategies. In other words, we demonstrate that inter-agent 
communication does not improve the efficiency of such systems, but merely affects the rate of convergence to the steady state. 
Furthermore, the proposed strategies do not rely on the knowledge of the details of the underlying stochastic process. Finally, we 
show that our proposed strategies yield an efficient, pure Nash equilibrium in a game theoretic formulation of the problem, in 
which each agent’s objective is to maximize the expected value of the “time spent alone” at the next target location. Simulation 
results are presented and discussed. 


I. Introduction 

A very active research area today addresses coordination of several mobile agents: groups of autonomous robots and large- 
scale mobile networks are being considered for a broad class of applications, ranging from environmental monitoring, to search 
and rescue operations, and national security. 

An area of particular interest is concerned with the generation of efficient cooperative strategies for several mobile agents to 

move through a certain number of given target points, possibly avoiding obstacles or threats [2]-[6]. Trajectory efficiency in 

these cases is understood in terms of cost for the agents: in other words, efficient trajectories minimize the total path length, the 

time needed to complete the task, or the fuel/energy expenditure. A related problem has been investigated as the Weapon-Target 
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Assignment (WTA) problem, in which mobile agents are allowed to team up in order to enhance the probability of a favorable 
outcome in a target engagement [7], [8]. In this setup, targets locations are known and an assignment strategy is sought that 
maximizes the global success rate. In a biological setting, the closest parallel to many of these problems is the development of 
foraging strategies, and of territorial vs. gregarious behaviors [9], in which individuals choose to identify and possibly defend 
a hunting ground. 

In this paper we consider a class of cooperative motion coordination problems, to which we can refer as dynamic vehicle 
routing , in which service requests are not known a priori, but are dynamically generated over time by a stochastic process 
in a geographic region of interest. Each service request is associated to a target point in the plane, and is fulfilled when one 
of a team of mobile agents visits that point. For example, service requests can be thought of as threats to be investigated in 
a surveillance application, events to be measured in an environmental monitoring scenario, and as information packets to be 
picked up and delivered to a user in a wireless sensor network. It is desired to design a control strategy for the mobile agents 
that provably minimizes the expected waiting time between the issuance of a service request and its fulfillment. In other words, 
our focus is on the quality of service as perceived by the “end user,” rather than, for example, fuel economies achieved by the 
mobile agents. Similar problems were also considered in [10], [11], and decentralized strategies were presented in [12]. This 
problem has connections to the Persistent Area Denial (PAD) and area coverage problems discussed, e.g., in [4], [13]—[15]. 

A common theme in cooperative control is the investigation of the effects of different communication and information 
sharing protocols on the system performance. Clearly, the ability to access more information at each single agent can not 
decrease the performance level; hence, it is commonly believed that by providing better communication among agents will 
improve the system’s performance. In this paper, we prove that there are certain dynamic vehicle routing problems which can, 
in fact, be solved (almost) optimally without any explicit communication between agents; in other words, the no-communication 
constraint in such cases is not binding, and does not limit the steady-state performance. The main contribution of this paper 
is the introduction of a motion coordination strategy that does not require any explicit communication between agents, while 
achieving provably optimal performance in certain conditions. 

The paper is structured as follows: in Section II we set up and formulate the problem we investigate in the paper. In Section 
III we introduce the proposed solution algorithms, and discuss their characteristics. Section IV is the technical core of the paper, 
in which we prove the convergence of the performance provided by the proposed algorithms to a critical point (either a local 
minimum or a saddle point) of the global performance function. Moreover, we show that any optimal configuration corresponds 
to a class of tessellations of the plane that we call Median Voronoi Tessellations. Section V is devoted to a game-theoretic 
interpretation of our result in which the agents are modeled as rational autonomous decision makers trying to maximize their 
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own utility function. We prove that, our algorithms yield an efficient, pure Nash equilibrium in a game theoretic formulation 
of the problem. In Section VI we present some numerical results, while Section VII is dedicated to final remarks and further 
extensions of this line of research. 


II. Problem Formulation 

Let O C R 2 be a convex domain on the plane, with non-empty interior; we will refer to fi as the workspace. A stochastic 
process generates service requests over time, which are associated to points in Q; these points are also called targets. The 
process generating service requests is modeled as a spatio-temporal Poisson point process, with temporal intensity A > 0, and 
an absolutely continuous spatial distribution described by the density function ip : O —> R + , with bounded and convex support 
within O (i.e., ip(q) > 0 45 q £ Q C O, with Q bounded and convex). The spatial density function ip is normalized in such a 
way that f n ip(q) dq = 1. Both A and ip are not necessarily known. 

A spatio-temporal Poisson point process is a collection of functions {V : M+ —> 2 n } such that, for any t > 0, V{t) is a 
random collection of points in 0, representing the service requests generated in the time interval [0, t), and such that 

• The total numbers of events generated in two disjoint time-space regions are independent random variables; 

• The total number of events occurring in an interval [s, s + t ) in a measurable set 5 C O that satisfies 

Pr [card ((P(» + t) - V(s)) n S) = k] = AA. ' AAl , Vk e N , 
where <p(S) is a shorthand for f s ip(q) dq. 

Each particular function V is a realization, or trajectory, of the Poisson point process. A consequence of the properties defining 
Poisson processes is that the expected number of targets generated in a measurable region SCO during a time interval of 
length At is given by: 

E[card {{V{t + At) — V{t)) n 5)] = AA t. ■ p{S). 

Without loss of generality, we will identify service requests with targets points, and label them in order of generation; in other 
words, given two targets ei,ej £ V(t), with i < j , the service request associated with these target have been issued at times 
ti <tj<t (since events are almost never generated concurrently, the inequalities are in fact strict almost surely). 

A service request is fulfilled when one of m mobile agents, modeled as point masses, moves to the target point associated 
with it; to is a possibly large, but finite number. Let p(t) = (pi{t),p 2 (t),... ,p m (t)) £ fl m be a vector describing the positions 
of the agents at time t. (We will tacitly use a similar notation throughout the paper). The agents are free to move, with bounded 
speed, within the workspace fl; without loss of generality, we will assume that the maximum speed is unitary. In other words. 
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the dynamics of the agents are described by differential equations of the form 

dp ^ = u i(t )» with ||'tq(i)|| < 1, Vt > 0,i e {1,... ,m}. (1) 

The agents are identical, and have unlimited range and target-servicing capability. 

Let Bi(t) C O indicate the set of targets serviced by the i-th agent up to time t. (By convention, £>,(0) = 0, i = 1,... ,m ). 
We will assume that £>, n Bj = 0 if i ^ j, i.e., that service requests are fulfilled by at most one agent. (In the unlikely event 
that two or more agents visit a target at the same time, the target is arbitrarily assigned to one of them). 

Let V : t —> 2 n indicate (a realization of) the stochastic process obtained by combining the service request generation 
process V and the removal process caused by the agents servicing outstanding requests; in other words, 

V(t) = V{t) U Bi(t) U ... U B m (t), V(t) n Bi{t) = 0, Mi g {1,... ,m}. 

The random set T>(t) C O represents the demand , i.e., the service requests outstanding at time t; let n(t) := card(2?(f)). 

Our objective in this paper will be the design of motion coordination strategies that allow the mobile agents to fulfill service 
requests efficiently (we will make this more precise in the following). In particular, in this paper we will concentrate on motion 
coordination strategies of the following two forms: 

7Tj : (pi,Bi,D) i-s- Ui, i € {1,... ,m}, (2) 

and 

Tr i :(p 1 ,...,Pm,Bi,'D)^u i , i £ (3) 

An agent executing a control policy of the form (2) relies on the knowledge of its own current position, on a record of 
targets it has previously visited, and on the current demand. In other words, such control policies do not need any explicit 
information exchange between agents; as such, we will refer to them as no communication (nc) policies. Such policies are 
trivially decentralized. 

Following the first control policy, it is certainly true that agent i must know the set T>(t) of outstanding targets, but this 
does not mean that this agent must know any of the sets Bj(t), for j ^ i. It is true that agents implicitly share/update the set 
of tasks that must be completed, but this does not mean that there is any communication among the agents. This is how the 
agents can have complete knowledge of V(t) without any communication among themselves (notice that it is possible to find 
out other ways to achieve a complete knowledge of T>(t) without communication, what follows is simply one of the possible 
ways). Indeed, suppose that the targets broadcast their service requests and their positions to the whole environment and they 
immediately stop broadcasting as soon as they are visited by one of the agents. Suppose also that each agent is able to receive 
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the signal broad-casted by each target and is able to detect the corresponding target position. In this set-up each agent knows 
precisely what are the outstanding targets, without exchanging directly any information with other agents. On the other hand, 
each agent seeing that a target has stopped broadcasting its service request can infer that that particular target has been visited. 
This is however not a communication among the agents, but simply an inference (and moreover the identity of the agent which 
has serviced that particular target is unknown to all other agents). Therefore the agents are not connected among them, they are 
instead connected with the targets, in the sense that they are able to receive and locate the signals broad-casted by all targets. 
The no-communication hypothesis is referred to the lack of direct communication among the agents, not to the inferences that 
each single agent can perform observing how the service requests of the outstanding targets are satisfied. 

Another possibility to get a complete knowledge of V(t) without explicit communication among the agents is as follows. 
Suppose there exists a satellite broadcasting the positions of all outstanding targets to the agents (the satellite simply broadcasts 
a list of available targets with their positions). Each time a target is serviced the satellite recognizes that that specific target 
is no more available (for instance because it has been destroyed) and drops it from the list of available targets. Each agent 
receives the same list from the satellite and also in this case, if a target disappear from the list, an agent which has not serviced 
that target can infer that it has been serviced by some other agent. 

On the other hand, an agent executing a control policy of the form (3) can sense the current position of other agents, but still 
has information only on the targets itself visited in the past. We call these sensor-based (sb) policies, to signify the fact that 
only factual information is exchanged between agents—as opposed to information related to intent and past history. Note that 
both families of coordination policies rely, in principle, on the knowledge of the locations of all outstanding targets. However 
only local target sensing will be necessary in practice. This last claim is difficult to justify from a theoretical point of view, 
and it is better understood in terms of simulations. For a more complete treatment of this issue, in particular the effect of 
limiting sensing radius for target sensing and its effect on the system performance, see the review [16]. 

A policy 7r = (7Ti, 7T2, • ■ •, 7r m ) is said to be stabilizing if, under its effect, the expected number of outstanding targets does 
not diverge over time, i.e., if 

n n = lim E[n(t)\\pi(t) = TT i (p(t),Bi(t), / D(t)),ie {l,...,m}\<oo. (4) 

t —>+oo 

Intuitively, a policy is stabilizing if the mobile agents are able to visit targets at a rate that is—on average—at least as fast as 
the rate at which new service requests are generated. 

Let Tj be the time elapsed between the issuance of the j-th service request, and the time it is fulfilled. If the system is 
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stable, then the following balance equation (also known as Little’s formula [17]) holds: 

n n = A TV, (5) 

where T v := Hindoo E [Tj\ is the system time under policy n, i.e., the expected time a service request must wait before being 
fulfilled, given that the mobile agents follow the strategy defined by n. Note that the system time 7V can be thought of as a 
measure of the quality of service, as perceived by the “user” issuing the service requests. 

At this point we can finally state our problem: we wish to devise a policy that is (i) stabilizing, and (ii) yields a quality of 
service (i.e., system time) achieving, or approximating, the theoretical optimal performance given by 

T op t = inf TV (6) 

7r stabilizing 

Centralized and decentralized strategies are known that optimize or approximate (6) in a variety of cases of interest [11], 
[12], [18], [19]. However, all such strategies rely either on a central authority with the ability to communicate to all agents, or 
on the exchange of certain information about each agent’s strategy with other neighboring agents. In addition, these policies 
require the knowledge of the spatial distribution ip; decentralized versions of these implement versions of Lloyd’s algorithm 
for vector quantization [20]. 

In the remainder of this paper, we will investigate how the additional constraints posed on the exchange of information 
between agents by the models (2) and (3) impact the achievable performance and quality of service. Remarkably, the policies 
we will present do not rely on the knowledge of the spatial distribution ip, and are a generalized version of MacQueen’s 
clustering algorithm [21], 


III. Control policy description 

In this section, we introduce two control policies of the forms, respectively, (2) and (3). An illustration of the two policies 
is given in Figure 1. 

A. A control policy requiring no explicit communication 

Let us begin with an informal description of a policy 7r nc requiring no explicit information exchange between agents. At 
any given time t, each agent computes its own control input according to the following rule: 

1) If V(t) is not empty, move towards the nearest outstanding target. 

2) If V(t) is empty, move towards the point minimizing the average distance to targets sen’iced in the past by each agent. 


If there is no unique minimizer, then move to the nearest one. 
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Fig. 1. An illustration of the two control policies proposed in Section III. While no targets are outstanding, vehicles wait at the point that minimizes 
the average distance to targets they have visited in the past; such points are depicted as squares, while targets are circles and vehicles triangles. In the 
no-communication policy, at the appearance of a new target, all vehicles pursue it (left). In the sensor-based policy, only the vehicle that is closest to the 
target will pursue it (right). 

In other words, we set 

Ttnc {pi{f),B i (f) 1 V(t)) = vers {F nc (pi(t),Bi(t),D(t)) - pt(t )), (7) 

where 

F nc (p i ,B l ,V) = < 

|| • || is the Euclidean norm, and 

vers(u) 

I 0 otherwise. 

The convex function W : q i—> ||</ — e||, often called the (discrete) Weber function in the facility location literature [22], 

[23] (modulo normalization by card(S)), is not strictly convex only when the point set B is empty—in which case we set 
W(-) = 0 by convention— or contains an even number of collinear points. In such cases, the minimizer nearest to p t in (8) 
is chosen. We will call the point p*(t) = F nc (-, Bft), 0) the reference point for the i- th agent at time t. 

In the 7 r nc policy, whenever one or more service requests are outstanding, all agents will be pursuing a target; in particular, 
when only one service request is outstanding, all agents will move towards it. When the demand queue is empty, agents will 
either (i) stop at the current location, if they have visited no targets yet, or (ii) move to their reference point, as determined 


argmin \\pi — 

-sll. 

if V ± 0, 

qev 

argmin > 

* 6n kL 

|e-g||, 

otherwise. 

[ v/\\v\\, 

if V ± 0, 



by the set of targets previously visited. 
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B. A sensor-based control policy 

The control strategy in the previous section can be modified to include information on the current position of other agents, if 
available (e.g., through on-board sensors). In order to present the new policy, indicate with V(p) = { Vi (p) , V 2 (p), ■ ■ •, V m (p)} 
the Voronoi partition of the workspace O, defined as: 

Vi(p) = {qefl: \\q-Pi\\ < \\q-pj\\,\/j = 1 ...m}. (9) 

As long as an agent has never visited any target, i.e., as long as £>j(f) = 0, it executes the 7r nc policy. Once an agent has 
visited at least one target, it computes its own control input according to the following rule: 

1) If 22(f) D Vi(t) is not empty, move towards the nearest outstanding target in the agent’s own Voronoi region. 

2) If 22(f) n Vi (f) is empty, move towards the point minimizing the average distance to targets in £>,(f). If there is no unique 
minimizer, then move to the nearest one. 

In other words, we set 

TTsb(p(f),#i(f),22(f)) = vers(F sb (p(f),S i (f),22(f)) -ft(f)), (10) 


where 

argmin \\pi — g||, if 22 ^ 0, and B t = 0 

q&V 


F sh (p,Bi, 22) 


arg min ||pi - g||, if 22 n Vi ^ 0, and B t ± 0 

q&vnVi(p) 


argmin \\e — g||, otherwise. 


( 11 ) 


In the 7r sb policy, at most one agent will be pursuing a given target, at any time after an initial transient that terminates 
when all agents have visited at least one target each. The agents’ behavior when no outstanding targets are available in their 
Voronoi region is similar to that determined by the 7r nc policy previously discussed, i.e., they move to their reference point, 
determined by previously visited targets. 

Remark 1: While we introduced Voronoi partitions in the definition of the control policy, the explicit computation of each 
agent’s Voronoi region is not necessary. In fact, each agent only needs to check whether it is the closest agent to a given 
target or not. In order to check whether a target point q is in the Voronoi region of the i-th agent, it is necessary to know the 
current position only of agents within a circle or radius \\pi — q\\ centered at q (see Figure 2). For example, if such circle is 
empty, then q is certainly in V,:; if the circle is not empty, distances of the agents within it to the target must be compared. 
This provides a degree of spatial decentralization—with respect to other agents—that is even stronger than that provided by 
restricting communications to agents sharing a boundary in a Voronoi partition (i.e., neighboring agents in the Delaunay graph. 


dual to the partition (9)). 
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Fig. 2. Implicit computation of Voronoi regions: Even though target e\ is the nearest target to pi, it is not in the Voronoi region of the 1st agent. In fact, 
the circle of radius ||ei — pi|| centered at e\ contains p 2 , and the 2nd agent is closer to e\. However, the circle of radius \\e 2 — pi|| centered at e2 does not 
contain any other agent, ensuring that e 2 is in the Voronoi region generated by p\. 

Remark 2: The sensor-based policy is more efficient than the no-communication policy in terms of the length of the path 
traveled by each agent, since there is no duplication of effort as several agents pursue the same target (unless a target is on the 
boundary of two or more Voronoi regions). However, in terms of “quality of service,” we will show that there is no difference 
between the two policies, for low target generation rates. Numerical results show that the sensor-based policy is more efficient 
in a broader range of target generation rates, and in fact provides almost optimal performance both in light and heavy load 
conditions. 


IV. Performance analysis in light load 

In this section we analyze the performance of the control policies proposed in the previous section. In particular, we 
concentrate our investigation on the light load case, in which the target generation rate is very small, i.e., as A —> 0 + . This 
will allow us to prove analytically certain interesting and perhaps surprising characteristics of the proposed policies. The 
performance analysis in the general case is more difficult; we will discuss the results of numerical investigation in Section VI, 
but no analytical results are available at this time. 

A. Overview of the system behavior in the light load regime 

Before starting a formal analysis, let us summarize the key characteristics of the agents’ behavior in light load, i.e., for small 
values of A. 

1) At the initial time the m agents are assumed to be deployed at general positions in Cl, and the demand queue is empty, 


V{0) = 0. 
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2) The agents do not move until the first service request appears. At that time, if the policy 7r nc is used, all agents will 
start moving towards the first target. If the sensor-based policy 7r s b is used, only the closest agent will move towards the 
target. 

3) As soon as one agent reaches the target, all agents start moving towards their current reference point, and the process 
continues. 

For small A, with high probability (i.e., with probability approaching 1 as A —> 0 + ) at most one service request is outstanding 
at any given time. In other words, new service requests are generated so rarely that most of the time agents will be able to 
reach a target and return to their reference point before a new service request is issued. This is just the content of the following. 

Proposition 3: Each agent will be able to return to its reference point before the generation of a new service request infinitely 
often with high probability as A —+ 0 + . 

Proof: Let ti be such that Bftf) 0, for all i £ {1,... ,?n}. Such a time exists, almost surely, basically because of 
the fairness of the proposed policies. At time t\ all agents will be within Q. Let n± = card(2?(ti)) be the total number of 
outstanding targets at time t\. An upper bound on the time needed to visit all targets in T>(tf) is ni(diam(<2)). When there 
are no outstanding targets, agents move to their reference points, reaching them in at most diam( Q) units of time. 

The time needed to service the initial targets and go to the reference configuration is hence bounded by t- m i < t \ + (m + 
l)diam(Q). The probability that at the end of this initial phase the number of targets is reduced to zero is 

P [n(fini) = 0] = exp(—A(fj n i - h)) > exp(-A(m + l)diam(Q)), 

that is, P [n(fi n i) = 0] —> 1“ as A —> 0 + . As a consequence, after an initial transient, all targets will be generated with all 
agents waiting at their reference points, and an empty demand queue, with high probability as A —> 0 + . ■ 

Consider the j-th service request, generated at time tj. Assuming that, at tj, all agents are at their reference positions, the 
expected system time E [Tj\ can be computed as 

E Pj] = [ . “in I \P*(tj) ~ 9ll V(q)dq. 

Assume for now that the sequences {p* (tj) : j 6 N} converge, and let 

lim p*(t j )=p*. 

J^+OO 

Note that p* is a random variable, the value of which depends in general on the particular realization of the target generation 
process. If all service requests are generated with the agents at their reference positions, the average service time (for small 
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A) can be evaluated as 

r. m 

T nc = T sh = / min \\p* - g|| p(q)dq = V / ll# _ tfll T>{o)dq- (12) 

Since the system time depends on the random variable p* = (j>*..... jx m ), it is itself a random variable. The function appearing 
on the right hand side of the above equation, relating the system time to the asymptotic location of reference points, is called 
the continuous multi-median function [22], This function admits a global minimum (in general not unique) for all non-singular 
density functions p, and in fact it is known [11] that the optimal performance in terms of system time is given by 

m r. 

T op t = min V / ||pi - g|| p(q)dq. (13) 

pen ir'iJvdP) 

In the following, we will investigate the convergence of the reference points as new targets are generated, in order to draw 
conclusions about the average system time T in light load. In particular, we will prove not only that the reference points 
converge with high probability (as A —> 0 + ) to a local critical point (more precisely, either local minima or saddle points) 
for the average system time, but also that the limiting reference points p* are generalized medians of their respective Voronoi 
regions, where 

Definition 4 (Generalized median): The generalized median of a set S C M” with respect to a density function p : S —> M+ 
is defined as 

P : = arg min / \\p - q\\p(q) dq. 

pen™ J s 

We call the resulting Voronoi tessellation Median Voronoi Tessellation (MVT for short), in analogy with what is done with 
Centroidal Voronoi Tessellations. A formal definition is as follows: 

Definition 5 (Median Voronoi Tessellation): A Voronoi tessellation V(p) = {Vj (p),..., V m (p)} of a set S C R" is called 
a Median Voronoi Tessellation of S with respect to the density function p if the ordered set of generators p is equal to the 
ordered set of generalized medians of the sets in V(p) with respect to p, i.e., if 

Pi = arg min/ \\s - q\\p(q) dq, Vi G {1,..., m}. 

S 6 R " JVi(p ) 

Since the proof builds on a number of intermediate results, we provide an outline of the argument as a convenience to the 
reader. 

1) First, we prove that the reference point of any agent that visits an unbounded number of targets over time converges 
almost surely. 

2) Second, we prove that, if m > 1 agents visit an unbounded number of targets over time, their reference points will 
converge to the generators of a MVT almost surely, as long as agents are able to return to their reference point infinitely 


often. 
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3) Third, we prove that all agents will visit an unbounded number of targets (this corresponds to a property of distributed 
algorithms that is often called fairness in computer science). 

4) Finally, we prove that agents are able to return to their reference points infinitely often with high probability as A —> 0 + . 
Combining these steps, together with (6), will allow us to state that the reference points converge to a local critical point of 
the system time, with high probability as A —■> 0 + . 

B. Convergence of reference points 
Let us consider an agent i, such that 

lim card (Bi(t)) = +oo, 

t —>-+oo 

i.e., an agent that services an unbounded number of requests over time. Since the number of agents m is finite, and the expected 
number of targets generated over a time interval [0, t) is proportional to t, at least one such agent will always exist. In the 
remainder of this section, we will drop the subscript i, since we will consider only this agent, effectively ignoring all others 
for the time being. 

For any finite t, the set Bit) will contain a finite number of points. Assuming that B(t) contains at least three non- 
collinear points, the discrete Weber function p i—> ||p — <?|| is strictly convex, and has a unique optimizer p*(t) = 

arg miiijjgo Y2 q eB(t) II P ~ <z||- The optimal point p*(t) is called the Fermat-Torricelli (FT) point—or the Weber point in the 
location optimization literature—associated with the set B(t)\ see [23]—[25] for a historical review of the problem and for 
solution algorithms. 

It is known that the FT point is unique and algebraic for any set of non-collinear points. While there are no general analytic 
solutions for the location of the FT point associated to more than 4 points, numerical solutions can be easily constructed 
relying on the convexity of the Weber function, and on the fact that it is differentiable for all points not in B. Polynomial-time 
approximation algorithms are also available (see, e.g., [26], [27]). Remarkably, a simple mechanical device can be constructed 
to solve the problem, based on the so-called Varignon frame, as follows. Holes are drilled on a horizontal board, at locations 
corresponding to the points in B. A string attached to a unit mass is passed through each of these holes, and all strings are 
tied together at one end. The point reached by the knot at equilibrium is a FT point for B. 

Some useful properties of FT points are summarized below. If there is a qo € B such that 

^2 vers ('?o - q) <1 (14) 

q£B\q 0 

then p* = qo is a FT point for B. If no point in B satisfies such condition, then the FT point p* can be found as a solution of 



13 


the following equation: 


vers (p* — q) = 0. 

q&B 


(15) 


In other words, p* is simply the point in the plane at which the sum of unit vectors starting from it and directed to each of 
the points in B is equal to zero; this point is unique if B contains non-collinear points. Clearly, the FT point is in the convex 
hull of B. 

Note that the points in Bit) are randomly sampled from an unknown absolutely continuous distribution, described by a 
spatial density function Cp —which is not necessarily the same as <p, and in general is time-varying, depending on the past 
actions of all agents in the system. Even though (jp is not known, it can be expressed as 

if q £ 1(f) 




(p{q,t) = \ 

0 otherwise, 

for some convex set 1(f) containing p(t). (In practical terms, such set will be the Voronoi region generated by p(f), in any 
case this remark is given as an aside and it is not important in the analysis of the proposed policies). 

The function f i— > p*(t) is piecewise constant, i.e., it changes value at the discrete time instants {tj : j £ N} at which the 
agent visits new targets. As a consequence, we can concentrate on the sequence {p*(tj) : j £ N}, and study its convergence. 
Definition 6: For any t > 0, let the solution set C(t) be defined as 


(7(f) := { p £ Q : 


< 1 


vers(p - q) 

| 968(f) 

An example of such set is shown in Figure 3. Before exploring the properties of these sets, let us define ej the target point 
corresponding to the j-th service request. The reason for introducing such solution sets is that they have quite remarkable 
properties as shown by the following result. 

Proposition 7: For any j £ N, p*(tj+i) £ C(tj). More specifically, if ej+ 1 ^ C(tj ) (i.e., the target point associated to the 
j-th service request is outside the solution set) then the FT point p*(tj+±) is on the boundary of C(tj). If e^+i £ C(tj), then 
P*(tj+1 ) = ej+i. 

Proof: If e,j + \ lies outside we search for p*(tj+i) as the solution of the equation 


vers(p — q) + vers (p — e J+ i) = 0, 

qeB(tj) 


from which it turns out immediately 


^ vers(p - q) 

q£ B{tj) 


||-vers(p- e j+ i)|| = 1, 
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Fig. 3. Example of a Fermat-Torricelli point (star) and solution set corresponding to five target points (circles). Upon the addition of an arbitrarily chosen 
sixth target point, the Fermat-Torricelli is guaranteed to remain within the region bounded by the curve. 

thus p*(tj+ 1 ) £ dC(tj). Notice that is is not true in general that the solution p*(tj+ 1 ) will lie on the line connecting p*(tj) 
with the new target Cj+i. In the other case, if e 3 + \ lies in C(tj), then it satisfies condition (14), and is the new FT point. ■ 
Now, in order to prove that the converges to a point p*, we will prove that the diameter of the solution set 

G(tj) vanishes almost surely as j tends to infinity. First we need the following result. 

Proposition 8: If Q = Supp(cp) is convex with non-empty interior, then p*(t) £ int(Q) almost surely, for all t such that 

Bit) f 0 . 

Proof: For any non-empty target set 13(f), the FT point lies within the convex hull of B(t). All points in the set B(t) are 
contained within the interior of Q with probability one, since the boundary of Q is a set of measure zero. Since int(Q) is 
convex, and B(t) C int(Q) almost surely, p*(t) £ co(B(t)) C int(Q), almost surely. ■ 

Proposition 9: If the support of ip is convex and bounded, 

lim diam(C(fj)) = 0, a.s. 

j —>+oo 

Proof: Consider a generic point p £ C(tj), and let 6 = p — p*(tj), a q = arccos [vers(p — p*(tj )) • vers (q — p*(tj))}, and 
a' q = arccos [vers(p — p*(tj)) ■ vers (q — p)\, see Figure 4. 

Since p £ C(tj), the magnitude of the sum of unit vectors E 9 eB(t ) vers (p ~ q) is no more than one, and the following 
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Fig. 4. Geometric constructions in the proof of Proposition 9. 


inequality is true: 


^ vers(p - q) 

•vers(p-p*(fj)) 

— 

^ (vers(p - q) • vers (p - p*(tj ))) 

= 

cos (a' q ) 

\q&B(tj) J 



q£B(tj) 


q&Bitj) 


<1- 06) 


Using elementary planar geometry, we obtain that 


a' - a q > sin (a' - a q ) = 


5 sin(a 9 ) 


-q ^q) — II _ " ■ 


Pick a small angle 0 < a m j n < 7r/2, and let 

Atj) = {?€ Bitj) : sin(a 9 ) > sin(a min )}. 

(In other words, B arnin (t) contains all points in B that are not in a conical region of half-width o ln j ri , as shown in Figure 4). 

For all q G B(tj), cos(a' q ) < cos(a g ); moreover, for all q G S amin , 

/ \ ./ w / i \ 3 sin(o! m i I1 ) 5sin(a m i n ) 

cos(a q ) < cos{a q ) — sm(a m i n )(a ? — a q ) < cos(a q ) ----— < cos(a 9 ) — 


\q-p II 


diam(Q) + 6 


Hence, summing over all q G B(tj), we get: 


cos(a' g ) < cos (a q ) - 


q£B(tj) 


qeB(tj) 


Observe now that in any case 


9 e8 °mi„<U) 


< 1, 


hsin(a min ) 2 
diam(Q) + <5 


(17) 


cos(a g ) 

(it is zero in case p*(tj) (j B(tj), and bounded in absolute value by one if p*{tj) G B(tj)). Therefore, rearranging equation 


(17): 


card(ff amin (t,)) C0S K)' C0S K)< 2 


diam(Q) + S 


Solving this inequality with respect to S we get: 

<5 < 


qGB(tj) 


2 diam( Q) 


qeB(tj) 


card (B amitt (tj)) sin(a min ) 2 - 2' 


(18) 
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Since (i) a m i n is a positive constant, (ii) p*(tj) is in the interior of Q, and (iii) linij^oo card (B(tj)) = +oo, the right hand 
side of (18) converges to zero with probability one. Since the bound holds for all points p £ C(tj), for all j £ N, the claim 
follows. ■ 

In the previous proposition, we have proven that \\p*(tj+i) — p*(tf )| tends to zero a.s. as j —> oo, under some natural 
assumptions on the distribution ip and its support. Unfortunately this is not sufficient to prove that the sequence 
is Cauchy; convergence of the sequence is however ensured by the following 
Proposition 10: The sequence {p*(tj)}j e ^ converges almost surely. 

Proof: Since the sequence {p*(tj)}j & n takes value in a compact set (the closure of Supp(</?)), by the Bolzano-Weirstrass 
theorem there exists a subsequence converging to a limit point p* in the compact set. Construct from a maximal 

subsequence converging to p *, and call J the set of indices of this maximal subsequence. If the original sequence {p*(tj)}j e n 
is not converging to p *, then there exists an L > 0 such that \\p*(tj) — p*|| > L, for any j £ N \ J, and this set of indices 
is unbounded. Take e := L/ 3 > 0. We have that |b*(4-i) — p*\\ < e for any sufficiently large (k — 1) £ J; moreover, 
\\p*(tk~i) — p*(tk )|| < e, a.s. by Proposition (9). Choose a sufficiently large (k — 1) £ J, such that k £ N — J (this is always 
possible since the complementary set of J is unbounded by the assumption of non-convergence). But now, we have 

L < ||p*(4) - Pi < lb*(4) -p*(4-t)|| + Ib*(4-i) -ni < \l, 

which is a contradiction. ■ 

C. Convergence to the generalized median 

By the discussion in the previous section, we know that the reference points of all agents that visit an unbounded number 
of targets converge to a well-defined limit. So do, trivially, the reference points of all agents that visit a bounded set of targets. 
Hence, we know that the sequence of reference points p*(tj) converges to a limit p*, almost surely for all i £ {1,..., m}. Let 
us denote by Vi(p*(tj)) the Voronoi region associated to the generator p* ( 4 ), and by Vfp*) the Voronoi region corresponding 
to the limit point p*. 

Proposition 11: If the limit reference points p* = (pi ,... ,p^) are distinct, then the sequence of Voronoi partitions {V(p*(tj))}j & ^ 
converges to the Voronoi partition generated by the limit of reference points, i.e., 

lim Vi(p*(tj)) = V.fp*), a.s. 

j—> 00 

Proof: The boundaries of regions in a Voronoi partition are algebraic curves that depend continuously on the generators, 
as long as these are distinct. Hence, under this assumption, almost sure convergence of the generators implies the almost sure 


convergence of the Voronoi regions. 
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As a next step, we wish to understand what is the relation between the asymptotic reference positions and their associated 
Voronoi regions. More precisely, let A C {1,... ,m} be the subset of indices of agents that visit an unbounded number of 
targets; we want to prove that p* is indeed the generalized median p, associated to agent i, with respect to the limiting set 
Vj(p*) and distribution <p(x), Vi G A. First we need the following technical result. 

Lemma 12: Let { f, } t p!i : Q —> M. be a sequence of strictly convex continuous functions, defined on a common compact 
subset Q C . Assume that each /,; has a unique X{ := arg min x f belonging to the interior of fl for any i and that this 
sequence of function converges uniformly to a continuous strictly convex function / admitting a unique minimum point x 
belonging also to the interior of fi. Then lim,^^ x t = x. 

Proof: Since {fifieN converges uniformly to /, then for any e > 0, there exists an /(e) such that for any i > m, 
|| fi — /|| < e uniformly in x £ fi. Let m := f(x), the minimum value achieved by /. Consider the set 

U e := {x € 0 such that f(x) < m + 2e}. 

Since / is strictly convex, for e sufficiently small, U e will be a compact subset contained in A. Moreover, since / is strictly 
convex, we have that U f is strictly included in Up, whenever e' > e and both are sufficiently small. It is also clear that 
lim^oo U e = x (nested strictly decreasing sequence of compact subsets all containing the point x). If ||/,; — /|| < e, we claim 
that x-i £ U e ; we prove this by contradiction. Since /,; > f — e, if Xi does not belong to U e , then iriinf/,) = ffxf) > m + e 
(this is just because U e is simply the set where the function (/ — e) < m + e). But since f t < f + e it turns out that 
fi(x) < m + e < min(/j), which is a contradiction. ■ 

We conclude this section with the following: 

Proposition 13: Assume that all agents in A are able to return infinitely often to their reference point between visiting two 
targets. Then, the limit reference points of such agents coincide, almost surely, with the generalized medians of their limit 
Voronoi regions, i.e., 

p* = arg min / \\p — q\\ip{q)dq, a.s., Vi € A. 

P^ n JVi(p-) 

Proof: For any i G A, define the functions f i>tj (p) ~ j lb - q\\ and ffp) := / v . (j3 „) ||p - q\\p(q) dq. These 

functions are continuous and well defined over O. We restrict their domains of definition to the compact set Q = Suppf^). 
These functions are also strictly convex and have unique minima in the interior of Q. Let us notice that, with our previous 
notations, we have that p*(tj) = argmin/ i)t .(p) and p i = argmin/j(p). Observe that the functions / i t .(p) and fi(p) can be 
considered random variables with respect to a probability space whose space of events coincide with all possible realizations 
of target sequences. Consider a restriction of these random variables to a new probability space whose space of events coincide 
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with all possible realizations of target sequences, for which the corresponding FT points converge to a limiting point. On this 
new probability space the random variable f, (p) becomes a deterministic function which is the expected value of the random 
variables /,, t . (p). Since Q is compact, it is immediate to see that fi,tj{p) has finite expectation and variance over this reduced 
probability space, and by the Strong Law of Large Numbers we can conclude that almost surely (over this reduced probability 
space) fi t tj(p) converge pointwise to /, (p). To show that f l t (p) converges pointwise to /,(p) over the original probability 
space, it is sufficient to observe the following. The original probability space is the probability space whose space of events 
coincide with all possible realizations of target sequences. We already know that almost surely the FT points associated to 
any possible realization of target sequences will converge. So we can fiber the space of events of the first probability space 
into spaces of events of reduced probability spaces, except for a set of measure zero. This is sufficient to prove that /) it (p) 
converge pointwise to fi(p) almost surely with respect to all possible realizations of target sequences. 

Now that we have proved that almost surely the sequence (p)}t eR converges pointwise to fi(p), we prove that it does 
converge uniformly. To do this, we use a theorem from [28], which is usually devoted to Dini-Arzela. The theorem states the 
following: an equicontinuous sequence of functions converges uniformly to a continuous function on a compact set Q if and 
only if it converges pointwise to a continuous function on the same compact set. Our sequence fi >t . (p) is equicontinuous if 
Ve > 0 and Vp £ Q there exists a <5 > 0 such that for all j £ {1,..., ?r} and for all p' £ Q with \\p' — p\\ < S , we have 
|| fi,tAp) — fi,tj(p')\\ < e; observe that 6 is independent on j, while in general it will depend on e and on p. Now we have 


Whtjip) -/i,t,V)ll < llb-tfll-lb'-gill- 

J q&Biitj) 


Using 


h~p\\ = h~p' + p’-p\\ < h~p'\\ + \\p-p'\\, 


it is immediate to see that 

Wkt^p) - fi,t 0 (p')\\ < - Wp-p’W ^ Wp-p'W- 

J qeBiitj) 

So it is sufficient to take <5 = e in the previous definition and <5 does not depend on j. So the sequence is equicontinuous and 
the pointwise convergence is upgraded to uniform convergence. 

We already know that almost surely the points p*(tj) do converge to points p* (Proposition (10)); therefore we can claim 
that p* = p,j, simply applying Lemma (12), which requires the uniform convergence. Thus we can claim that the reference 
position of each agent which services infinitely many targets following our algorithm converges to the generalized median of 


its Voronoi region, almost surely. 
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D. Fairness 

In this section, we prove that, as long as ip is strictly positive over a convex set, both policies introduced in Section III are 
fair. 

Proposition 14 (Fairness): If Q = Supp(</?) is convex, all agents eventually visit an unbounded number of targets, almost 
surely, i.e., 

lim card(Sj(f)) = +oo, a.s., Vi € {1,... , to}. 

t —>-+oo 

Proof: Under either policy, each agent will pursue and is guaranteed to service the nearest target in its own Voronoi region. 
Hence, in order to show that an agent services an unbounded number of targets, it is sufficient to show that the probability 
that the next target be generated within its Voronoi region remains strictly positive, i.e., that 

/ <p{q)dq > 0. 

Since Q is convex, and all agents move towards the nearest target, at least initially, all agents will eventually enter Q, and remain 
within it. Let us denote with P t j the probability that agent i visits the j- th target. For any i this probability is always strictly 
positive. Indeed, even if it happens that some agents are servicing simultaneously the same target (simply because the service 
request appears at the boundary of two different Voronoi regions), this does not mean that their reference points have to coincide. 
For the reference points of two agents eventually to coincide, it must happen that they are servicing infinitely many often and 
simultaneously the same target. Since the boundaries of Voronoi regions have measure zero and <p is a continuous distribution 
without singular components, we can claim that liirij^ +OC! P, 3 > 0 almost surely. Now call P = minj = i v .. )TO limj^ +(X) Pij. 
Then P is strictly positive almost surely. Therefore, the probability that the z-th agent does not visit an unbounded number of 
targets is bounded from above by linij ^ +00 — P) k = 0, almost surely. ■ 


E. Efficiency 


In this section, we will prove that the system time provided by both the algorithms in Section III converges to a critical 
point (either a saddle point or a local minimum) with high probability as A —■> 0 + . 

In the preceding sections, we have proved that—as long as each agent is able to return to its reference point between 
servicing two targets, infinitely often—the reference points p*(tj ) converge to points p*, which generate a MVT. In such case, 
we know that the average time of service will converge to 


T 


7T 



171 r. 

<?l| p(q)dq = II p* - q\\ <p{q)dq- 

i=i J ^(p*) 


(19) 
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Consider now functions 7Y m of the form: 

P m p 

Hmipi, ■ ■ ■ ,Pm) = / min lb* - q\\ <p(q)dq = V / lift - g|| p(q)dq. (20) 

J n *=!,...,m ~iJVi(p) 

Observe that 7V belongs to the class of functions of the form 77,„ where each point p, is constrained to be the generalized 
median of the corresponding Voronoi region (i.e., V(p) is a MVT). 

We want to prove that 7 _ is a critical point of 77 m . To do so, we consider an extension of 77 m , i.e. a functional /C m defined 
as follows: 

m p 

i=l 

Observe that in this case the regions are not restricted to form a MVT with respect to the generators {ft};=i,...,m. 

Thus we can view the functional 77 m we are interested in as a constrained form of the unconstrained functional /C m . It turns 
out therefore that critical points of /C m are also critical points of 77 m . With respect to critical points of /C m we have the 
following result: 

Proposition 15: Let {ft}j=i,... > m denote any set of m points belonging to Supp(<p) and let {Vi}i=i,... lTO denote any tessella¬ 
tion of Supp(<p) into to regions. Moreover, let us define IC m as above. Then a sufficient condition for {pi,...,p . m , Vi,..., V m } 
to be a critical point (either a saddle point or a local minimum), is that the Vi’s are the Voronoi regions corresponding to the 
Pi s, and, simultaneously, the pfs are the generalized median of the corresponding Vi’s. 

Proof: Consider first the variation of K, m with respect to a single point, say pi. Now let v be a vector in R 2 , such that 
Pi + ev £ fl. Then we have 


(ft + ev) - /C m (pi) 



{||y-ft 


ew ll - \\y-Pi\W p{y)dy , 


where we have not listed the other variables on which IC m depends since they remain constant in this variation. By the very 
form of this variation, it is clear that if the point p t is the generalized median for th e fixed region V,, we will have that /C m (ft + 
ev) — Kl m (pi) > 0, for any v. Now consider the points {ft } i= i m fixed and consider a tessellation {Z7j}i = i m different 
from the Voronoi regions {V*}*=i,..., TO generated by the points pfs. We compare the value of IC m (pi,... ,p m , Vi,..., V m ), 
with the value of IC m (pi,... ,p m ,lli, ■ ■. ,Um)- Consider those y which belong to the Voronoi region V, generated by Pj , and 
possibly not to the Voronoi region of another p,. Anyway, since U, is not a Voronoi tessellation, it can happen that in any case 
these y belong to Hi. Thus for these particular y’s we have p(y)\[y — Pj\\ < <p(y)\\y — Pi ||. Moreover, since are 

not the Voronoi tessellation associated to the pfs, the last inequality must be strict over some set of positive measure. Thus 
we have that JC m (j>i, ■ ■ ■ ,Pm , Vi,..., V m ) < ■ •. ,p m Mi, ■ ■ ■ Mm )» an d therefore /C m is minimized, keeping fixed the 


pfs exactly when the subset Vi’s are chosen to be the Voronoi regions associated with the point pf s. 
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By the previous proposition and by the fact that critical points of the unconstrained functional K. m are also critical points 
of the constrained functional TL m , we have that the MVT are always critical points for the functional TL rn , and in particular 
7L is either a saddle point or a local minimum for the functional 'H m . 

We can now conclude with following: 

Theorem 16 (Efficiency): The system time provided by the no-communication policy 7r nc and by the sensor-based policy 
7r s b converges to a critical point (either a saddle point or a local minimum) with high probability as A —» 0 + . 

Proof: Combining results in Propositions 10 and 13 we conclude that the reference points of all agents that visit an 
unbounded number of targets converge to a MVT, almost surely—provided agents can return to the reference point between 
visiting targets. Moreover, the fairness result in Proposition 14 shows that in fact all agents do visit an unbounded number of 
targets almost surely; as a consequence of Proposition 15 the limit configuration is indeed a critical point for the system time. 
Since agents return infinitely often to their reference positions with high probability as A —> 0, the claim is proven. ■ 

Thus we have proved that the suggested algorithm enables the agents to realize a coordinated task, such that “minimizing” 
the cost function without explicit communication, or with mutual position knowledge only. Let us underline that, in general, 
the achieved critical point strictly depends on the initial positions of the agents inside the environment O. It is known that the 
function Tt m admits (not unique, in general) global minima, but the problem to find them is NP-hard. 

Remark 17: We can not exclude that the algorithm so designed will converge indeed to a saddle point instead of a local 
minimum. This is due to the fact that the algorithm provides a sort of implementation of the steepest descent method, where, 
unfortunately we are not following the steepest direction of the gradient of the function TL m , but just the gradient with respect 
to one of the variables. For a broader point of view of steepest descent in this framework see for instance [29]. 

On the other hand, since the algorithm is based on a sequence of targets and at each phase we are trying to minimize a 
different cost function, it can be proved that the critical points reached by this algorithm are no worse than the critical points 
reached knowing a priori the distribution <p. This is a remarkable result proved in a different context in [30], where an example 
is presented in which the use of a sample sequence provides a better result (with probability one) than the a priori knowledge 
of ip. In that specific example the algorithm with the sample sequence does converge to a global minimum, while the algorithm 
based on the a priori knowledge of the distribution ip gets stuck in a saddle point. 

F. A comparison with algorithms for vector quantization and centroidal Voronoi tessellations 

The use of Voronoi tessellations is ubiquitous in many fields of science, ranging from operations research, animal ethology 
(territorial behaviour of animals), computer science (design of algorithms), to numerical analysis (construction of adaptive grids 
for PDEs and general quadrature rules), and algebraic geometry (moduli spaces of abelian varieties). For a detailed account of 
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possible applications see for instance the book [29]. In the available literature, most of the analysis is devoted to applications 
of centroidal Voronoi tessellations, i.e., Voronoi tessellations such that 


Pi = argmin / \\s - q\\ 2 <p(q) dq, Vi € {1,..., m}. 

A popular algorithm due to Lloyd [20] is based on the iterative computation of centroidal Voronoi tessellations. The algorithm 
can be summarized as follows. Pick in generator points, and consider a large number n of samples from a certain distribution. 
At each step of the algorithm generators are moved towards the centroid of the samples inside their respective Voronoi region. 
The algorithm terminates when each generator is within a given tolerance from the centroid of samples in its region, thus 
obtaining a centroidal Voronoi tessellation weighted by the sample distribution. There is also a continuous version of the 
algorithm, which requires the a priori knowledge of a spatial density function, and computation of the gradient of the polar 
moments of the Voronoi regions with respect to the positions of the generators. An application to coverage problems in robotics 
and sensor networks of Lloyd’s algorithm is available in [14], 

The algorithms we introduced in this paper are more closely related to an algorithm due to MacQueen [21], originally designed 
as a simple online adaptive algorithm to solve clustering problems, and later used as the method of choice in several vector 
quantization problems where little information about the underlying geometric structure is available. MacQueen’s algorithm 
can be summarized as follows. Pick m generator points. Then iteratively sample points according to the probability density 
function ip. Assign the sampled point to the nearest generator, and update the latter by moving it in the direction of the sample. 
In other words, let qj be te j-th sample, let i*(j) be the index of the nearest generator, and let c = (ci,..., c m ) be a counter 
vector, initialized to a vector of ones. The update rule takes the form 


Pi * (j) 


c i*(j)Qj Pi*U) 
C i* 0 ) + 1 


C i* (j) C i*(j) + I' 


The process is iterated until some termination criterion is reached. Compared to Lloyd’s algorithm, MacQueen’s algorithm has 
the advantage to be a learning adaptive algorithm, not requiring the a priori knowledge of the distribution of the objects, but 
rather allowing the online generation of samples. It can be recognized that the update rule in MacQueen’s algorithm corresponds 
to moving the generator points to the centroids of the samples assigned to them. 

The algorithm we propose is very similar in spirit to MacQueen’s algorithm, however, there is a remarkable difference. 
MacQueen’s algorithm deals with centroidal Voronoi tessellations, thus with the computation of fc-means. Our algorithm instead 
is based on MVT, and on the computation of /.'-medians. In general, very little is known for Voronoi diagrams generated using 
simply the Euclidean distance instead of the square of the Euclidean distance. For instance, the medians of a sequence of 
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points can exhibit a quite peculiar behavior if compared to the one of the means. Consider the following example. Given a 
sequence of points in a compact set K C R 2 , we can construct the induced sequence of means: 

1 N 


m N 


i =1 


and analogously the induce sequence of FT points we considered in the previous sections. Call FT\; the FT point corresponding 
to the first N points of the sequence {gj},gN- We want to point out that induced sequence {rrij} and {FTj} have a very 
different behaviour. Indeed, the induced sequence of means will always converge as long as the points qj s belong to a compact 
set. To see this, just observe that if diam(iv) < L, then ||TOj|| < L. Moreover, it is immediate to see that ||tojv+i — rrijv|| < 7^. 
Then one can conclude using the same argument of Proposition 10. On the other hand, one can construct a special sequence 
of points (jj s in a compact set K for which the induced sequence of FT points does not converge. This is essentially due to 
the fact that while the contribution of each single point q 7 in moving the position of the mean decreases as j increases, this 
could not happen in the case of the median. To give a simple example, start with the following configuration of points in 
R 2 : q\ = (1,0), <72 = (—1,0), (73 = (0,1) and <74 = (0,-1). Then the sequence of points qj s continue in the following way: 
ft = (0,1) if k > 4, and k odd, q^ = (0, —1) if k > 4 and k is even. Using the characterization of FT points, it is immediate 
to see that FTk = (0,0) for k > 4 and k even, while FTf. = (0, tan( 7 r/ 6 )) for k > 4 and k odd, so the induced sequence 
can not converge. This phenomenon can not happen to the sequence of means, which is instead always convergent. Therefore, 
it should be clear that the use of MVT instead of centroidal Voronoi tessellations makes much more difficult to deal with the 
technical aspects of the algorithm such as its convergence. 


V. A GAME-THEORETIC POINT OF VIEW 

In previous sections, we proposed two policies, 7t s b and 7r nc , which yield the p* reference points under light load. A key 
component of the two policies was the update rule for the reference points. In this section we provide a game-theoretic 
interpretation of the p* reference points. In particular, we frame our presentation along the works [ 8 ], [31], in which game- 
theoretic point of view has been introduced in the study of cooperative control and strategic coordination of decentralized 
networks of multi-agents systems. We prove that the p* reference points, under light load conditions, are an efficient pure Nash 
equilibria in a multi-player game setting where each agent is interested in maximizing its own utility. It turns out that, by just 
trying to maximize their own utility function, the agents will indeed maximize a common global utility function. The update 
rule for the reference point proposed as part of the two policies in the earlier sections can then be regarded as a mechanism 
for the vehicles to learn the p* strategy. 

We first formulate the game in the strategic form [32] as follows. Consider a scenario with the same workspace Cl, density 
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function ip with support Q and the same stochastic process for generating service requests as described in the previous sections. 
We replace the terms service requests and targets with resources to fit better in the context of this section. The players of the 
game are the m vehicles. The vehicles are assumed to be rational autonomous decision makers trying to maximize their own 
utility function. The resources offer rewards in a continuous fashion and the vehicles can collect these rewards by traveling to 
the resource locations. Every resource offers reward at a rate, which depends on the number of vehicles present at its location: 
the reward rate is unity when there is one vehicle and it is zero when there are more than one vehicles. Moreover, the life of 
the resource ends as soon as more than one vehicles are present at its location. The sensing and communication capabilities of 
the vehicles are as follows: (i) the location of a resource is broadcast to all the vehicles as soon as it is generated; similarly, 
every vehicle is notified as soon as a resource ceases to exist, and (ii) there is no explicit communication between the vehicles, 
i.e., the vehicles do not have knowledge about each other’s position and no vehicle knows the identities of other vehicles 
visiting any resource location. 

This setup can be understood to be an extreme form of congestion game [33], where the resource cannot be shared between 
agents and that the resource is cut off at the first attempt to share it. The total reward for vehicle i from a particular resource 
is the time difference between its arrival and the arrival of the next vehicle, if i is the first vehicle to reach the location of the 
resource, and zero otherwise. (Note that, since a vehicle cannot communicate with any other vehicle, it cannot determine if it 
will be the first one to reach the resource location when the location is broadcast to it). 

We focus our analysis on the light load case here, i.e., when A —> 0 + . Moreover, we assume that all the vehicles are present 
in Q at time t = 0 and that there are no resources at time t 0. Hence, there will be utmost one active resource at any 
time almost surely. Note that these assumptions on the initial conditions are without any loss of generality in light of the 
discussion in the earlier sections. Since our focus is on the light load scenario, we let the strategy space of agent i to be Q 
for all i £ {1,... ,m}. Specifically, the strategy of an agent i denoted as 7r,; is identified with a reference point in Q which 
the agent approaches in the absence of outstanding service requests. On the generation of a resource, the agents move directly 
towards its location and return back to the reference location once the resource ceases to be active. We will use 7r_i £ Q m ~ 1 
to denote the strategy specification of all the agents, except agent i, i.e., 7r_i := (7Ti,..., 7r»_i, 7r*+i,..., 7r m ). Hence, we may 
write strategy vector 7r as (7Tj,7r_»). Let Ui : Q m —> K be the utility function of vehicle i. For a given strategy vector 7r, let 
rfq, 7 r) be the reward collected by agent i for resource generated at location q £ Q. In light of the discussion above, the utility 
function of an agent i is defined as 

= E q [ri(q, 7r)]. ( 21 ) 


Equation (21) implies that the goal of every vehicle is to maximize the expected value of the reward from the next resource. 
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With this, we complete the formal description of the game at hand. For brevity in notation, we use Q to denote this game. We 
now derive a working expression for the utility function. 

As mentioned before, the reward for agent i is the time till the arrival of the second agent at point q if agent i is the 
first to reach q and zero otherwise. Since the vehicles move at unit speed, the reward for an agent i can be written as 
ri(q, n) = max{0, min^* ||Xj — q\\ — ||7r,: — g||}. The utility function for agent i can then be written as 

Ui(ni,7r-i) = Eq[max{0,min \\ttj - q\\ - ||7r.; - q\\}] = / max{0,min ||7r,- - g|| - ||7r* - q\\}<p(q)dq. (22) 

Jq 

However, we know that 

{ min^i ||t tj - g|| - ||7Tj - g||, if q £ V,(tt) 

0 , otherwise . 

Substituting this into Equation (22), we derive the following expression for the utility function. 

Ui^u-K-i) = / (min ||tTj - g|| - ||7T; - q\\)cp(q)dq. (23) 

JVi(TT) 2#* 

We now prove that Q belongs to a class of multi-player games called potential games. In a potential game, the difference 
in the value of the utility function for any agent for two different strategies, when the strategies of the other agents are kept 
fixed, is equal to the difference in the values of a potential function that depends only on the strategy vector and not on the 
label of any agent. The formal definition [34] is as follows. 

Definition 18: A finite ?n-player game with strategy spaces {rij}™ =1 and utility functions is a potential game if, 

for some potential function if : x ie r 1> —> R, 


Ui{ 7T-, 7T_i) - Ui{ 7r", 7T_i) = t/>(7t', 7T_ j) - , 7T_»)| 

for every player i £ m}, for every 7r_j £ and for every 7r', 7 r" £ 11,. 


Proposition 19: G is a potential game. 
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Proof: The expression for the utility function of agent i, as given by Equation (23), can be rewritten as: 


Ui^un-i) = / min \\trj - q\\ip(q)dq - / - q\\ip{q)dq 

JVi( 7r) *^Vi( 7r) 

m r. m r. 

+ J2 hj - q\\<p{q)dq~y] / Utt,-- 5M5)d5 

3 -.1 ^Vj(TT) j = l JVj(7r) 

r. m m r. 

= / minlkj - qWv^dq + V] / ||7rj - g||<p(g)dg - V / Htt,- - 5||^(5)<^5 

JViW 7T) 7T) 

« m n. m r. 

= / min ||7Tj - 5||y>(5)d5 + V / min ||7T fe - 5||<^(5)^5 - Y] / Ikj - 5ll<^(5)c?5 

JviW 3*' 

r. m r. 

= / min ||7Tj - 5||¥>(5)d5 ~ Y^ / IKj - 5|k(5)^5- 
JQ.1T* JVh(v) 


(24) 


7 = 1 (7r) 


In the integrand of the first term in Equation (24), min^ 17r ,■ — g|| is the distance from point q to the closest among all the 
agents, except the 1 th agent. We then consider the Voronoi partition with generators n = tt \ 7r,;, and let Vj{iT-i) be the 
corresponding Voronoi cell belonging to agent j. Equation (24) can then be written as 


m r. m 

Ui{-Ki,-K-i) = Y3 / hi -q\\y(q)dq- V / IKj - 5||V ? (<3')^5- 

Jh-'VjtTr f) j=rihW 


(25) 


Note that the first term on the right hand side of Equation (25) is independent of tt,. With this observation, consider the 
following potential function: 

m ^ 

~ (26) 


= / IKi-5M5)d5- 

i=i 


The proposition then follows by combining Equations (25) and (26) with the definition of a potential game. 


It turns out that an agent’s motive to maximize its own utility function is aligned with the global objective of minimizing 
the average system time. To formally establish this fact, we start with a couple of definitions. First, we extend the concept of 
a potential game. 

Definition 20: A finite ?n-player game with strategy spaces {IIj}^. 1 and utility functions {Ui} l fL 1 is an ordinal potential 
game if, for some potential function ft : —> R, 

U i ('K i ,-K-i) -Ui( 7r”,7r_j) > 0 if and only if _,) - V’( 7r i> 7r -*) > 

for every player i £ {1, ..., m}, for every 7 r_.; £ Xj^H and for every 7 r', tt” £ 11 ,. 

Remark 21: Note that every potential game is also an ordinal potential game. 

Definition 22: The set of agents utilities {Ui\i- is aligned with the global utility U g if and only if the game with 
utility functions {£4}i=i,...,m is an ordinal potential game with U g as a potential function. 
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For the game Q, define the global utility function to be the negative of the average system time under policy 7 r, i.e., 
Ugijr) = —T n , which can be rewritten as 

771 r. 

Ug{n) = \ Ik* - Q\\<P{Q)dq- ( 27 ) 

i=l • / V,(*r) 

Proposition 23: The utility functions of the agents in Q are aligned with its global utility function. 

Proof: Comparing Equation (27) with the expression of the potential function in Equation (26), we observe that U g { 7r) = 
ip (it). Proposition 19 implies that Q is a potential game, and hence an ordinal potential game, with ip as the potential function. 
The proposition then follows from Definition 22. ■ 

We now show that our utility function belongs to a class of utility functions called Wonderful Life Utility Functions [35], 
Definition 24: Given a global utility function W g (7r,, 7r_,), the Wonderful Life (local) utility function is given by 

(n i-7T—i ) lAg (7T.j, 71—f) l4 g (^7T_if 

where lA g (Ti—i) is the global utility in absence of agent i. 

Remark 25: The Wonderful Life utility function measures the marginal contribution of an agent towards the global utility. 
Proposition 26: The local utility function, as defined in Equation (25) is a Wonderful Life utility function with respect to 
the global utility function defined in Equation (27). 

Proof: We refer to Equation (25) in the proof of Proposition 19, where we derived an alternate expression for the local 
utility function. Comparing the two terms individually with the expression for the global utility function in Equation (27), we 
get that 

7T—.f) — 7T_j) lAg{lt—f). 

The proposition then follows from the definition of the Wonderful Life utility function. ■ 

The 7r nc policy yields the equilibrium strategy p* = {p\,. .. such that, for all i £ {1,..., m}, p* is the median of the 

Voronoi region Vffy ). We now state and prove results regarding the efficiency and equilibrium status of the p* strategy in the 

game theoretic setting of this section. We first state the following definition adapted from [8]: 

Definition 27: A strategy 7r is called a pure Nash equilibrium if, for all i = {1,..., m}, 

U,{tTi,T:-i) = m.&yiUi{-Ki,Tt-i). (28) 

7Tj g n j 

Moreover, a strategy 7r is called efficient if there is no other strategy that yields higher utilities to all the agents. 

Proposition 28: The p* strategy is an efficient pure Nash equilibrium for the game Q. 
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Proof: For any 7r_i = *_»}, 

p* = argmin eQ / || pi - q\\p{q)dq. 

JVi(p) 

Combining this with the definition of an efficient Nash equilibrium, one arrives at the proposition. ■ 

Remark 29: Note that we do not claim in Proposition 28 that the it* strategy provides the unique efficient pure Nash 
equilibrium for the game Q. For instance to find other efficient pure Nash equilibria it is sufficient to modify the algorithm 
during the initial phases, when the FT points are not uniquely determined. These modifications produce different policy vectors 
which are anyway all efficient pure Nash equilibria, as it is immediate to see. 

In general, it is true that alignment does not prevent pure Nash equilibria from being suboptimal from the point of view of 
the global utility. Moreover, even efficient pure Nash equilibria (i.e. pure Nash equilibria which yield the highest utility to all 
agents) can be suboptimal from the perspective of the global utility function. Such a phenomenon is indeed what happens in 
our construction. 

In the earlier sections, we proved that the update rule for the reference point which was part of both the policies converges 
to p* almost surely under light load conditions. That update rule can then be thought of a learning algorithm for the agents 
to arrive at the efficient Nash equilibrium p* , even without explicit knowledge of the history of policies played by the other 
agents at every iteration. 

VI. Numerical results 

In this section, we present simulation results showing the performance of the proposed policies for various scenarios. 


A. Uniform distribution, light load 


In the numerical experiments, we first consider m = 9, choose Q as a unit square, and set ip = 1 (i.e., we consider a spatially 
uniform target-generation process). This choice allows us to determine easily the optimal placement of reference points, at the 
centers of a tessellation of Q into nine equal squares, and compute analytically the optimal system time. In fact, it is known 
that the expected distance of a point q randomly sampled from a uniform distribution within a square of side L from the center 
of the square c is 


E[|k-c||] = 


\/2 + log(l + \/2 ) ^ 


0.3826 L. 


The results for a small value of A, i.e., A = 0.5, are presented in Figure 5. The average service time converges to a value 
that is very close to the theoretical limit computed above, taking L = 1 / ffiin = 1/3. In both cases, the reference points 
converge—albeit very slowly—to the generators of a MVT, while the average system time quickly approaches the optimal 


value. 
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East 



Fig. 5. Numerical simulation in the light-load case, for a uniform spatial distribution. Top left: the actual service times as a function of time, for the two 
policies, compared with the optimal system time. Top right: the initial configuration of the nine agents. Bottom left and right: paths followed by the reference 
points up to t = 10 4 (corresponding to approximately 5,000 targets), using the two policies. The locations of all targets visited by one of the agents are also 
shown. 

B. Non-uniform distribution, light load 

We also present in Figure 6 results of similar numerical experiments with a non-uniform distribution, namely an isotropic 
normal distribution centered at (0.25, 0.25), with standard deviation equal to 0.25. 

C. Uniform distribution, dependency on the target generation rate 

An interesting set of numerical experiments evaluates the performance of the proposed policies over a large range of values 
of the target generation rate A. In Section IV, we proved the convergence of the system’s behavior to an efficient steady 
state, with high probability as A —> 0 + , as confirmed by the simulations discussed above. For large values of A however, the 
assumption that vehicles are able to return to their reference point breaks down, and the convergence result is no longer valid. 
In figure 7 we report results from numerical experiments on scenarios involving m = 3 agents, and values of A ranging from 
1/2 to 32. In the figure, we also report the known (asymptotic) lower bounds on the system time (with 3 agents), as derived 
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No-communication policy t ( = 10000, X = 1/2, normal distribution 
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Fig. 6. Numerical simulation in the light-load case, for a normal spatial distribution. Top left: the actual service times as a function of time, for the two 
policies. Top right: the initial configuration of the nine agents. Bottom left and right: paths followed by the reference points up to t = 10 4 (corresponding to 
approximately 5,000 targets), using the two policies. The locations of all targets visited by one of the agents are also shown. 

in [11], and the system time obtained with the proposed policies in a single-agent scenario. 

The performance of both proposed policies is close to optimal for small A, as expected. The sensor-based policy behaves 
well over a large range of target generation rates; in fact, the numerical results suggest that the policy provides a system time 
that is a constant-factor approximation of the optimum, by a factor of approximately 1.6. 

However, as A increases, the performance of the no-communication policy degrades significantly, almost approaching the 
performance of a single-vehicle system over an intermediate range of values of A. Our intuition in this phenomenon is the 
following. As agents do not return to their own reference points between visiting successive targets, their efficiency decreases 
since they are no longer able to effectively separate regions of responsibility. In practice—unless they communicate and 
concentrate on their Voronoi region, as in the sensor-based policy—agents are likely to duplicate efforts as they pursue the 
same target, and effectively behave as a single-vehicle system. Interestingly, this efficiency loss seems to decrease for large A, 


and the numerical results suggest that the no-communication policy recovers a similar performance as the sensor-based policy 
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Fig. 7. System time provided by the policies proposed in this paper, as a function of the target generation rate A. The system is composed of three vehicles, 
and the target points are generated uniformly in the unit square. Each data point is obtained taking the mean of the results obtained running one hundred times 
the system dynamics, for each specified policy and for each specified target generation rate A. Therefore each data point is a Monte Carlo approximation of 
the theoretical system time corresponding to that policy and to that target generation rate. 

in the heavy load limit. Unfortunately, we are not able at this time to provide a rigorous analysis of the proposed policies for 
general values of the target generation rate. 

Finally, let us add two remarks. The first concerns what happens if the agents are slower. After a suitable rescaling, reducing 
the agent velocities is equivalent to increase the target generation rate A. As it is evident from Figure 7 even when A increases 
both policies are stabilizing, in the sense that the system time is finite in both cases. On the other hand, the system time is 
much smaller for the sensor based policy when A > 2.5, or equivalently when the agents velocities are decreased. Therefore 
the sensor based policy performs better for bigger values of A, that is to say for smaller agent velocities. 

The second remark deals with the following question: what happens if D(t) in not known, but only a delayed version of 
it is known to the agents? Assume that the agents know D(t — td), where td is a positive integrable random variable, that is 
assume that the delay with which the information about the set of outstanding targets reaches the agents, is a positive integrable 
random variable td- Assume moreover that E[td] = A. It is immediate to see that the presence of this random delay will affect 
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the performance of the system, namely the time of service, increasing it by an amount equal to A. So for instance, the plots 
in Figure 7 will shift upward of a quantity equal to A, whenever the delay is modeled in this way. 

VII. Conclusions 

In this paper we considered two very simple strategies for multiple vehicle routing in the presence of dynamically-generated 
targets, and analyzed their performance in light load conditions, i.e., when the target generation rate is very small. The strategies 
we addressed in this paper are based on minimal assumptions on the ability of the agents to exchange information: in one 
case they do not explicitly communicate at all, and in the other case, agents are only aware of other agents’ current location. 
A possibly unexpected and striking results of our analysis is the following: the collective performance of the agents using 
such minimal or no-communication strategies is (locally) optimal, and is in fact as good as that achieved by the best known 
decentralized strategies, see for instance [12] for a comparison. The proposed no-communication algorithm does converge to a 
(local) optimum, even though quite slowly compared to other policies. On the other hand, the proposed strategies do not rely 
on the knowledge of the target generation process, and make minimal assumptions on the target spatial distribution; in fact, the 
convexity and boundedness assumptions on the support of the spatial distribution can be relaxed, as long as path connectivity 
of the support, and absolute continuity of the distribution are ensured. Also, the distribution need not be constant: Indeed, the 
algorithm will provide a good approximation to a local optimum for the cost function as long as the characteristic time it takes 
for the target generation process to vary significantly is much greater than the relaxation time of the algorithm. In summary, 
the proposed strategies can be seen as a learning mechanism in which the agents learn the target generation process, and the 
ensuing target spatial distribution, and adapt their own behavior to it. 

The proposed strategies are very simple to implement, as they only require storage of the coordinates of points visited in 
the past and simple algebraic calculations; the “sensor-based” strategy also require a device to estimate the position of other 
agents. Simple implementation and the absence of active communication makes the proposed strategies attractive, for example, 
in embedded systems and stealthy applications. The game-theoretic interpretation of our results also provides some insight 
into how territorial, globally optimal behavior can arise in a population of selfish but rational individuals even without explicit 
mechanisms for territory marking and defense. 

While we were able to prove that the proposed strategies perform efficiently for small values of the target generation rate, 
little is known about their performance in other regimes. In particular, we have shown numerical evidence that suggests that the 
first strategy we introduced, requiring no communication, performs poorly when targets are generated very frequently, whereas 
the performance of the sensor-based strategy is in fact comparable to that of the best known strategies for the heavy load case. 

Extensions of this work will include the analysis and design of efficient strategies for general values of the target generation 
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rate, for different vehicle dynamics models (e.g., including differential constraints on the motion of the agents), and heteroge¬ 
neous systems in which both service requests and agents can belong to several different classes with different characteristics 
and abilities. Initial work on reactive coverage problems for multiple agents with various kinodynamic constraints has been 
reported in [36], [37], Another direction of research is to extend the game theoretic formulation for the problem in this paper 
to formally study various hunting and foraging strategies in animals [38], 
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